We are going to analyze a breast cancer section from 10x Genomicsis Visium Gene Expression platform.
For the analysis we will mainly use the R package Seurat.
You can download the data from this website or using curl, as shown below. We do not need to do it because it is already in the data folder.
# download files
curl -O https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Breast_Cancer_Block_A_Section_1/V1_Breast_Cancer_Block_A_Section_1_filtered_feature_bc_matrix.h5
curl -O https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Breast_Cancer_Block_A_Section_1/V1_Breast_Cancer_Block_A_Section_1_spatial.tar.gz
These are the libraries we are going to use:
# set seed for reproducibility purposes
set.seed(12345)
# create palette for the cell types
cell_type_palette <- as.vector(pals::polychrome())
se_obj <- Load10X_Spatial(
data.dir = "data"
)
The visium data from 10x consists is stored in a Seurat object as:
se_obj[["Spatial"]][1:5, 1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
## AAACAAGTATCTCCCA-1 AAACACCAATAACTGC-1 AAACAGAGCGACTCCT-1
## MIR1302-2HG . . .
## FAM138A . . .
## OR4F5 . . .
## AL627309.1 . . .
## AL627309.3 . . .
## AAACAGGGTCTATATT-1 AAACAGTGTTCCTGGG-1
## MIR1302-2HG . .
## FAM138A . .
## OR4F5 . .
## AL627309.1 . .
## AL627309.3 . .
SpatialPlot(
se_obj,
pt.size = 0
) +
NoLegend()
se_obj@images$slice1@scale.factors
## $spot
## [1] 0.08250825
##
## $fiducial
## [1] 286.7012
##
## $hires
## [1] 0.08250825
##
## $lowres
## [1] 0.02475247
##
## attr(,"class")
## [1] "scalefactors"
As with single-cell objects, we have some important features that we can use to filter out bad quality spots.
plot_1 <- VlnPlot(
se_obj,
features = "nCount_Spatial",
pt.size = 0.1
) +
NoLegend()
plot_2 <- SpatialFeaturePlot(
se_obj,
features = "nCount_Spatial"
)
plot_1 | plot_2
The variability in the distribution of UMIs is related to the tissue architecture, i.e. tumoral regions are more densely packked and thus contain spots with higher counts.
plot_1 <- VlnPlot(
se_obj,
features = "nFeature_Spatial",
pt.size = 0.1
) +
NoLegend()
plot_2 <- SpatialFeaturePlot(
se_obj,
features = "nFeature_Spatial"
)
plot_1 | plot_2
Here we can also see how the number of genes per spot correlates with the structure of the tissue
We have to compute this two values by calculating the percentage of reads per spot that belong to mitochondrial/ribosomal genes.
# Mitochondrial content
se_obj[["mt.content"]] <- PercentageFeatureSet(
object = se_obj,
pattern = "^MT-")
summary(se_obj[["mt.content"]])
## mt.content
## Min. : 1.118
## 1st Qu.: 2.761
## Median : 3.504
## Mean : 4.009
## 3rd Qu.: 4.678
## Max. :14.415
# Ral contentibosomal
se_obj[["rb.content"]] <- PercentageFeatureSet(
object = se_obj,
pattern = "^RPL|^RPS")
summary(se_obj[["rb.content"]])
## rb.content
## Min. : 6.697
## 1st Qu.:10.765
## Median :11.752
## Mean :11.958
## 3rd Qu.:12.923
## Max. :21.482
SpatialFeaturePlot(
se_obj,
features = c("mt.content", "rb.content")
)
In the case of spatial data, high mitochondrial content is not an indicator of bad quality spots and thus should not be a criterion to filter out spots.
We are going to filter out genes that have no expression in the tissue.
table(rowSums(as.matrix(se_obj@assays$Spatial@counts)) == 0)
##
## FALSE TRUE
## 24923 11678
keep_genes <- rowSums(as.matrix(se_obj@assays$Spatial@counts)) != 0
se_obj <- se_obj[keep_genes, ]
Furthermore, we will filter out spots with very low number of counts (< 500) before proceeding with the downstream analysis.
se_obj <- subset(
se_obj,
subset = nCount_Spatial > 500
)
Similar to single-cell datasets, preprocessing for spatial data requires normalizing, finding variable features and scaling the counts.
se_obj <- NormalizeData(se_obj)
se_obj <- FindVariableFeatures(se_obj)
se_obj <- ScaleData(se_obj)
Prior to clustering, we perform dimensionality reduction.
se_obj <- RunPCA(se_obj)
se_obj <- RunUMAP(
se_obj,
reduction = "pca",
dims = 1:20
)
se_obj <- FindNeighbors(
se_obj,
reduction = "pca",
dims = 1:20
)
se_obj <- FindClusters(
se_obj,
resolution = 0.3
)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3798
## Number of edges: 132786
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9245
## Number of communities: 9
## Elapsed time: 0 seconds
plot_1 <- DimPlot(
se_obj,
label = TRUE
)
plot_2 <- SpatialDimPlot(
se_obj,
) +
NoLegend()
plot_1 + plot_2
ESR1 and ERBB2 are the most common mutations in breast cancer, so one way of annotating the tissue is by looking at ESR1 and ERBB2 positive/negative regions.
SpatialFeaturePlot(
se_obj,
features = c("ESR1", "ERBB2"),
alpha = c(0.1, 1)
)
We are going to use SPOTlight in conjunction with a subset of the Tumor Immune Cell Atlas to deconvolute our spots and map immune populations to our tumor section.
# load library
library(SPOTlight)
Load downsampled version of the Tumor Immune Cell Atlas and explore the metadata we have. We are going to use annotation level 1 for the deconvolution (lv1_annot).
atlas <- readRDS("data/TICAtlas_downsample.rds")
head(atlas@meta.data)
## patient nCount_RNA nFeature_RNA percent.mt gender subtype
## MEL2_2_W462024 MEL2_2 1888 831 7.097458 unknown CM
## MEL2_2_W462047 MEL2_2 1556 734 11.053985 unknown CM
## MEL2_2_W462184 MEL2_2 1895 846 16.569921 unknown CM
## MEL2_2_W462249 MEL2_2 1672 806 8.193780 unknown CM
## MEL2_1_W462337 MEL2_1 570 310 10.701754 unknown CM
## MEL2_2_W462742 MEL2_2 1214 620 12.602965 unknown CM
## source lv1_annot lv2_annot
## MEL2_2_W462024 melanoma2 T cells naive CD4 memory stem cells
## MEL2_2_W462047 melanoma2 CD8 cytotoxic CD8 cytotoxic
## MEL2_2_W462184 melanoma2 CD8 effector memory NK
## MEL2_2_W462249 melanoma2 T cells naive CD4 memory stem cells
## MEL2_1_W462337 melanoma2 CD4 naive-memory CD4 memory stem cells
## MEL2_2_W462742 melanoma2 CD8 terminally exhausted CD4 resident effector memory
First of all we need to compute the markers for the cell types using Seurat’s funtion FindAllMarkers.
Idents(atlas) <- "lv1_annot"
sc_markers <- FindAllMarkers(
object = atlas,
assay = "RNA",
slot = "data",
only.pos = TRUE
)
sc_markers <- filter(
sc_markers,
avg_log2FC >= 0.7
)
sc_markers <- filter(
sc_markers,
str_detect(string = gene,
pattern = "^Rps|^Rpl|^Mt-",
negate = TRUE
)
)
saveRDS(sc_markers, "R_obj/filtered_atlas_markers.rds")
sc_markers <- readRDS("R_obj/filtered_atlas_markers.rds")
sc_markers %>%
group_by(cluster) %>%
top_n(5, wt = avg_log2FC)
## # A tibble: 105 x 7
## # Groups: cluster [21]
## p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
## <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <chr>
## 1 1.00e-29 1.77 0.5 0.059 1.06e-24 T cells naive FLT3LG
## 2 1.23e-27 2.08 0.62 0.107 1.31e-22 T cells naive TCF7
## 3 1.72e-21 1.68 0.74 0.196 1.83e-16 T cells naive IL7R
## 4 1.82e-19 1.63 0.46 0.079 1.93e-14 T cells naive CCR7
## 5 3.43e-12 1.66 0.64 0.23 3.64e- 7 T cells naive LTB
## 6 1.14e-20 1.77 0.76 0.202 1.21e-15 CD8 cytotoxic GZMA
## 7 1.58e-18 2.00 0.98 0.416 1.67e-13 CD8 cytotoxic CCL5
## 8 2.12e- 8 1.32 0.68 0.331 2.25e- 3 CD8 cytotoxic CD2
## 9 2.58e- 5 1.39 0.36 0.155 1 e+ 0 CD8 cytotoxic ARPC5L
## 10 2.78e- 4 1.34 0.18 0.055 1 e+ 0 CD8 cytotoxic CSF1
## # ... with 95 more rows
Run the deconvolution using the scRNAseq atlas and the spatial transcriptomics data.
decon_mtrx_ls <- spotlight_deconvolution(
se_sc = atlas,
counts_spatial = se_obj@assays$Spatial@counts,
clust_vr = "lv1_annot",
cluster_markers = sc_markers,
hvg = 0,
min_cont = 0,
assay = "RNA",
slot = "counts")
saveRDS(decon_mtrx_ls, "R_obj/deconvolution_matrix.rds")
decon_mtrx_ls <- readRDS("R_obj/deconvolution_matrix.rds")
Before even looking at the decomposed spots we can gain insight on how well the model performed by looking at the topic profiles for the cell types.
nmf_mod <- decon_mtrx_ls[[1]]
decon_mtrx <- decon_mtrx_ls[[2]]
rownames(decon_mtrx) <- colnames(se_obj)
# info on the NMF model
nmf_mod[[1]]
## <Object of class: NMFfit>
## # Model:
## <Object of class:NMFns>
## features: 2609
## basis/rank: 21
## samples: 1043
## theta: 0.5
## # Details:
## algorithm: nsNMF
## seed: NMF
## RNG: 10403L, 624L, ..., -689249108L [e0d3d05f1b721ce7d02aa5d9b936a60e]
## distance metric: 'KL'
## residuals: 988388.3
## Iterations: 2000
## Timing:
## user system elapsed
## 741.69 2.12 744.81
# deconvolution matrix
head(decon_mtrx)
## B.cells CD4.effector.memory CD4.naive.memory
## AAACAAGTATCTCCCA-1 0.000000000 4.787970e-18 0
## AAACACCAATAACTGC-1 0.109467376 3.278547e-02 0
## AAACAGAGCGACTCCT-1 0.152343572 1.493283e-02 0
## AAACAGGGTCTATATT-1 0.462152569 1.029516e-17 0
## AAACAGTGTTCCTGGG-1 0.000000000 4.114186e-02 0
## AAACATTTCCCGGATT-1 0.001411841 2.556300e-02 0
## CD4.recently.activated CD4.transitional.memory CD8.cytotoxic
## AAACAAGTATCTCCCA-1 0 0.28183238 0
## AAACACCAATAACTGC-1 0 0.27723642 0
## AAACAGAGCGACTCCT-1 0 0.50784627 0
## AAACAGGGTCTATATT-1 0 0.09591864 0
## AAACAGTGTTCCTGGG-1 0 0.17426676 0
## AAACATTTCCCGGATT-1 0 0.48668046 0
## CD8.effector.memory CD8.pre.exhausted
## AAACAAGTATCTCCCA-1 0 0.03728053
## AAACACCAATAACTGC-1 0 0.07940511
## AAACAGAGCGACTCCT-1 0 0.13425242
## AAACAGGGTCTATATT-1 0 0.05342847
## AAACAGTGTTCCTGGG-1 0 0.08483890
## AAACATTTCCCGGATT-1 0 0.07119686
## CD8.terminally.exhausted cDC Macrophages.SPP1 mDC
## AAACAAGTATCTCCCA-1 0.17089407 0.04239509 0 0
## AAACACCAATAACTGC-1 0.00000000 0.01880710 0 0
## AAACAGAGCGACTCCT-1 0.00000000 0.02648264 0 0
## AAACAGGGTCTATATT-1 0.09249216 0.06192265 0 0
## AAACAGTGTTCCTGGG-1 0.00000000 0.03689448 0 0
## AAACATTTCCCGGATT-1 0.00000000 0.02745349 0 0
## Monocytes NK pDC Plasma.B.cells
## AAACAAGTATCTCCCA-1 0 0.153720000 0.02090671 0.04565245
## AAACACCAATAACTGC-1 0 0.000000000 0.07860440 0.05336292
## AAACAGAGCGACTCCT-1 0 0.000000000 0.02741362 0.01610836
## AAACAGGGTCTATATT-1 0 0.003493694 0.07979980 0.06826405
## AAACAGTGTTCCTGGG-1 0 0.000000000 0.08817366 0.06997692
## AAACATTTCCCGGATT-1 0 0.000000000 0.07421472 0.06674091
## T.cells.naive T.cells.regulatory T.helper.cells TAMs.C1QC
## AAACAAGTATCTCCCA-1 0 0.0000000 0.000000000 0.13375576
## AAACACCAATAACTGC-1 0 0.0000000 0.233590096 0.06219759
## AAACAGAGCGACTCCT-1 0 0.0000000 0.008634505 0.04868357
## AAACAGGGTCTATATT-1 0 0.0000000 0.000000000 0.05379755
## AAACAGTGTTCCTGGG-1 0 0.1532738 0.139836908 0.11184478
## AAACATTTCCCGGATT-1 0 0.0000000 0.157483558 0.07646224
## TAMs.proinflamatory res_ss
## AAACAAGTATCTCCCA-1 0.11356301 0.4351546
## AAACACCAATAACTGC-1 0.05454352 0.5368353
## AAACAGAGCGACTCCT-1 0.06330221 0.5212597
## AAACAGGGTCTATATT-1 0.02873041 0.4487828
## AAACAGTGTTCCTGGG-1 0.09975192 0.4937794
## AAACATTTCCCGGATT-1 0.01279293 0.5570431
Look at how specific the topic profiles are for each cell type.
h <- NMF::coef(nmf_mod[[1]])
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- SPOTlight::dot_plot_profiles_fun(
h = h,
train_cell_clust = nmf_mod[[2]])
topic_profile_plts[[2]]
Let’s now visualize how the deconvoluted spots on the the Seurat object.
# keep on
decon_mtrx <- decon_mtrx_ls[[2]]
decon_mtrx <- decon_mtrx[, colnames(decon_mtrx) != "res_ss"]
decon_mtrx <- decon_mtrx[, !is.na(colnames(decon_mtrx))]
se_obj@meta.data <- cbind(se_obj@meta.data, decon_mtrx)
saveRDS(se_obj, "R_obj/breast_slide_deconvoluted.rds")
We have an object with the deconvolution information already added:
se_obj <- readRDS("R_obj/breast_slide_deconvoluted.rds")
ct <- colnames(decon_mtrx)
scatterpie_plot(
se_obj = se_obj,
cell_types_all = ct,
pie_scale = 0.3
) +
coord_fixed(ratio = 1) +
scale_fill_manual(values = cell_type_palette)
## [1] "Using slice slice1"
Look at the spatial scatterpie by looking at cell types which are not present throughout the entire tissue
# keep only cell types that are present in less than 80% of the spots
keep_0.8 <- colSums(se_obj@meta.data[, ct] > 0) < 0.8 * ncol(se_obj)
# but not those that were not found on the tissue
keep_g0 <- colSums(se_obj@meta.data[, ct] > 0) > 0
# select cell types fullfiling the conditions
ct_var <- colnames(se_obj@meta.data[, ct])[keep_0.8 & keep_g0]
ct_var
## [1] "B.cells" "CD8.terminally.exhausted"
## [3] "Monocytes" "NK"
## [5] "T.cells.regulatory" "T.helper.cells"
scatterpie_plot(
se_obj = se_obj,
cell_types_all = ct_var,
pie_scale = 0.3
) +
coord_fixed(ratio = 1) +
scale_fill_manual(values = cell_type_palette)
## [1] "Using slice slice1"
Lets now see the frequencies of selected cell types across the regions in the tissue. We will focus on cytotoxic T cells and macrophages as they can give an idea of how infiltrated the tumor can be.
VlnPlot(
se_obj,
features = c("CD8.terminally.exhausted", "TAMs.proinflamatory"),
group.by = "seurat_clusters"
)
plot_1 <- SpatialPlot(
se_obj,
pt.size = 0
) +
NoLegend()
plot_2 <- SpatialDimPlot(
se_obj,
group.by = "seurat_clusters"
)
plot_1 + plot_2
We can see that in region of cluster 2 we have a presence of both exhausted T cells and proinflamatory macrophages.
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
## [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Spain.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] imager_0.42.11 magrittr_2.0.1 cowplot_1.1.1
## [4] NMF_0.23.0 Biobase_2.53.0 BiocGenerics_0.40.0
## [7] cluster_2.1.2 rngtools_1.5.2 pkgmaker_0.32.2
## [10] registry_0.5-1 SPOTlight_0.1.7 patchwork_1.1.1
## [13] SeuratObject_4.0.4 Seurat_4.0.6 forcats_0.5.1
## [16] stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
## [19] readr_2.1.1 tidyr_1.1.4 tibble_3.1.6
## [22] ggplot2_3.3.5 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 reticulate_1.22 tidyselect_1.1.1
## [4] htmlwidgets_1.5.4 grid_4.1.0 Rtsne_0.15
## [7] scatterpie_0.1.7 munsell_0.5.0 codetools_0.2-18
## [10] ica_1.0-2 future_1.23.0 miniUI_0.1.1.1
## [13] withr_2.4.3 colorspace_2.0-2 highr_0.9
## [16] knitr_1.37 rstudioapi_0.13 ROCR_1.0-11
## [19] tensor_1.5 listenv_0.8.0 labeling_0.4.2
## [22] polyclip_1.10-0 bit64_4.0.5 farver_2.1.0
## [25] parallelly_1.30.0 vctrs_0.3.8 generics_0.1.1
## [28] xfun_0.29 R6_2.5.1 doParallel_1.0.16
## [31] ggbeeswarm_0.6.0 pals_1.7 hdf5r_1.3.5
## [34] spatstat.utils_2.3-0 assertthat_0.2.1 promises_1.2.0.1
## [37] scales_1.1.1 beeswarm_0.4.0 gtable_0.3.0
## [40] Cairo_1.5-12.2 globals_0.14.0 bmp_0.3
## [43] goftest_1.2-3 rlang_0.4.12 splines_4.1.0
## [46] lazyeval_0.2.2 dichromat_2.0-0 spatstat.geom_2.3-1
## [49] broom_0.7.10 yaml_2.2.1 reshape2_1.4.4
## [52] abind_1.4-5 modelr_0.1.8 backports_1.4.1
## [55] httpuv_1.6.4 tools_4.1.0 gridBase_0.4-7
## [58] ellipsis_0.3.2 spatstat.core_2.3-2 jquerylib_0.1.4
## [61] RColorBrewer_1.1-2 ggridges_0.5.3 Rcpp_1.0.7
## [64] plyr_1.8.6 rpart_4.1-15 deldir_1.0-6
## [67] pbapply_1.5-0 zoo_1.8-9 haven_2.4.3
## [70] ggrepel_0.9.1 fs_1.5.2 data.table_1.14.2
## [73] RSpectra_0.16-0 scattermore_0.7 readbitmap_0.1.5
## [76] lmtest_0.9-39 reprex_2.0.1 RANN_2.6.1
## [79] fitdistrplus_1.1-6 matrixStats_0.61.0 hms_1.1.1
## [82] mime_0.12 evaluate_0.14 xtable_1.8-4
## [85] jpeg_0.1-9 readxl_1.3.1 gridExtra_2.3
## [88] compiler_4.1.0 maps_3.4.0 KernSmooth_2.23-20
## [91] crayon_1.4.2 htmltools_0.5.2 ggfun_0.0.4
## [94] mgcv_1.8-38 later_1.3.0 tzdb_0.2.0
## [97] tiff_0.1-10 lubridate_1.8.0 DBI_1.1.2
## [100] tweenr_1.0.2 dbplyr_2.1.1 MASS_7.3-54
## [103] Matrix_1.4-0 cli_3.1.0 parallel_4.1.0
## [106] igraph_1.2.10 pkgconfig_2.0.3 plotly_4.10.0
## [109] spatstat.sparse_2.1-0 xml2_1.3.3 foreach_1.5.1
## [112] vipor_0.4.5 bslib_0.3.1 rvest_1.0.2
## [115] digest_0.6.29 sctransform_0.3.2 RcppAnnoy_0.0.19
## [118] spatstat.data_2.1-2 rmarkdown_2.11 cellranger_1.1.0
## [121] leiden_0.3.9 uwot_0.1.11 shiny_1.7.1
## [124] lifecycle_1.0.1 nlme_3.1-153 jsonlite_1.7.2
## [127] mapproj_1.2.7 viridisLite_0.4.0 fansi_0.5.0
## [130] pillar_1.6.4 lattice_0.20-45 ggrastr_1.0.1
## [133] fastmap_1.1.0 httr_1.4.2 survival_3.2-13
## [136] glue_1.6.0 png_0.1-7 iterators_1.0.13
## [139] bit_4.0.4 ggforce_0.3.3 stringi_1.7.6
## [142] sass_0.4.0 irlba_2.3.5 future.apply_1.8.1